Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove maybeHead from BackwardSyncContext #5497

Merged
merged 16 commits into from
Jun 14, 2023

Conversation

siladu
Copy link
Contributor

@siladu siladu commented May 25, 2023

Summary of Changes

  • Remove what appears to be a redundant global variable, maybeHead. When backwards sync is performed, this can cause besu to incorrectly jump back to a much older block instead of a recent block.
  • Add logging to help debug these issues

Context

This is a partial fix for issues noticed on recent nimbus-besu combos: #5411
Note, there are still issues due to nimbus behaviour, so there is instability when restarting Nimbus for a period of about an hour where it gets in sync but falls out again repeatedly.
It should also help with any CL that causes a backwards sync in Besu.

It should prevent the issue where we see large "reorgs" in the logs e.g.:

{"@timestamp":"2023-04-27T00:47:24,241","level":"WARN","thread":"vert.x-worker-thread-0","class":"DefaultBlockchain","message":"Chain Reorganization +192 new / -0 old
       Old - hash: 0xbca29e861aabf1b975cde3ee6f147a86ad8de3bfab1a39bfb2d98594a72eb63d, height: 3368145
       New - hash: 0xd1fb6c37bf03b2ed56bfd07699e9097119f143b600f4a33e4010edee3261a75a, height: 3368337
  Ancestor - hash: 0xbca29e861aabf1b975cde3ee6f147a86ad8de3bfab1a39bfb2d98594a72eb63d, height: 3368145","throwable":""}
{"@timestamp":"2023-04-27T00:47:24,670","level":"WARN","thread":"nioEventLoopGroup-3-9","class":"DefaultBlockchain","message":"Chain Reorganization +0 new / -192 old
       Old - hash: 0xd1fb6c37bf03b2ed56bfd07699e9097119f143b600f4a33e4010edee3261a75a, height: 3368337
       New - hash: 0xbca29e861aabf1b975cde3ee6f147a86ad8de3bfab1a39bfb2d98594a72eb63d, height: 3368145
  Ancestor - hash: 0xbca29e861aabf1b975cde3ee6f147a86ad8de3bfab1a39bfb2d98594a72eb63d, height: 3368145","throwable":""}

because instead of rewinding back to an incorrectly maintained maybeHead hash, it will rewind back to the hash that was given to it in the FCU or newPayload.

Note, we still see these reorgs when Prysm lock-steps syncs in batches of ~50 newPayloads

Testing

This goal of this testing was to ensure the node could get back into sync in each scenario (and also explore the various CL syncing strategies but that's not relevant for this PR). Testing was done on Sepolia, unless otherwise mentioned

  • Initial Sync (performs backwards sync as last stage)
  • Scenario 1: CL restart
  • Scenario 2: CL short downtime: 30 - 120 mins
  • Scenario 3: CL long downtime: 7 hours
  • Scenario 4: Besu short downtime: 30 - 60 mins
  • Scenario 5: Besu long downtime: 3 - 5 hours
  • Scenario 6: Besu longer downtime: 20 hours
Scenario/CL Nimbus Teku Prysm Lighthouse Lodestar Nimbus Mainnet Teku Mainnet
Initial Sync
Scenario 1
Scenario 2
Scenario 3 ❓*
Scenario 4
Scenario 5
Scenario 6
  • Prysm Scenario 3 was a 43 hour downtime. With Prysm's lock-step it is a slow recovery (not actually backwards sync anyway so not related to this fix)...it was quicker to recover by resyncing prysm.

Impact

Scenario 1 with Nimbus (the main driver for this fix) before and after the fix:

BEFORE
nimbus-without-fix

AFTER
nimbus-with-fix

After being restarted, Nimbus on Sepolia takes some time to find peers, so it is maybe getting more towards Scenario 2. No lock-step is attempted, instead Nimbus causes a few small backward syncs with some halts in between.
The main difference with before is that besu is only backward syncing a few blocks rather than many hundreds due to the Besu bug.

Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
It's usage is not clear and it's causing besu to rewind the head to it (a large reorg) when combined with nimbus backwards sync
Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
@github-actions
Copy link

github-actions bot commented May 25, 2023

  • I thought about documentation and added the doc-change-required label to this PR if updates are required.
  • I have considered running ./gradlew acceptanceTestNonMainnet locally if my PR affects non-mainnet modules.
  • I thought about the changelog and included a changelog update if required.
  • If my PR includes database changes (e.g. KeyValueSegmentIdentifier) I have thought about compatibility and performed forwards and backwards compatibility tests

@siladu siladu changed the title Remove maybe head Remove maybeHead from BackwardSyncContext May 25, 2023
@siladu siladu added the TeamGroot GH issues worked on by Groot Team label May 29, 2023
Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
@siladu siladu marked this pull request as ready for review June 1, 2023 06:08
@siladu siladu requested a review from fab-10 June 1, 2023 06:11
maybeHead = Optional.empty();
} else {
maybeHead = Optional.of(headHash);
public synchronized void maybeUpdateTargetHeight(final Hash headHash) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename method to reflect what is still does after removing maybeHead

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I am not wrong this is redundant, and the method can be removed, since the update of the target height is done in the syncBackwardsUntil method, that is called just after

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I will consider removing this and possiblyMoveHead in a subsequent PR. I want to study the code a bit more and don't want to invalidate this PR's testing.

if (blockchain.contains(head)) {
LOG.debug("Changing head to {}", head);
blockchain.rewindToBlock(head);
if (blockchain.getChainHead().getHash().equals(lastSavedBlock.getHash())) {
Copy link
Contributor Author

@siladu siladu Jun 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of maybeHead, I am checking the passed in hash (which should originate from the FCU). I'm not sure if this check is useful at all now though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You also removed the blockchain.contains(hash) check. Any reason for that? Could possiblyMoveHead be called before we save the block?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously it was only acting on maybeHead and that is now removed. There wasn't an equivalent check for lastSavedBlock.

possiblyMoveHead is currently only called immediately after appending the block to the chain

this.getProtocolContext()
.getBlockchain()
.appendBlock(block, optResult.getYield().get().getReceipts());
possiblyMoveHead(block);

I'm wondering whether to remove possiblyMoveHead entirely now as I'm not sure we would ever rewind to the last saved block (should always match this if condition I think)...
Would be good to get a second opinion from @fab-10

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the code possiblyMoveHead could be removed, but I am still if thinking if this method was introduced to manage cases when there was a reorg, and still do not have a full answer, anyway the amount of tests that you have done make me confident

Copy link
Contributor

@macfarla macfarla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

if (blockchain.contains(head)) {
LOG.debug("Changing head to {}", head);
blockchain.rewindToBlock(head);
if (blockchain.getChainHead().getHash().equals(lastSavedBlock.getHash())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You also removed the blockchain.contains(hash) check. Any reason for that? Could possiblyMoveHead be called before we save the block?

@@ -128,6 +138,13 @@ protected Optional<List<Block>> processResponse(
// Clear processed headers
headers.clear();
}
LOG.atDebug()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be too noisy for debug level. Does this create a lot of logs? Might be better at the trace level.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In backwards sync context, it's a single line per batch (max 200 blocks). I think single line per BWS batch is reasonable for debug, maybe listing 200 hashes in one line is a bit much though.

I have not considering if this task is used outside of backwards sync context, so will rethink.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's an example:

{"@timestamp":"2023-05-19T07:44:58,628","level":"DEBUG","thread":"nioEventLoopGroup-3-5","class":"GetBodiesFromPeerTask","message":"Associated 115 bodies with 200 headers to get 115 blocks with these hashes: [0xa581c89e8006f754a2cdbb4e3865fd59ac9d6130ab21a16d5d0e8f706e21bc30, 0x1f131c3e95d9b4f78a917109901f3e3ecbbce9520fe2cfbf1733deee42d7e8e5, 0x91ed8c0f5115ec63ba073e8bf11e5856cc7d3d9db0a8105bf7aa760349b768ad, 0x8c5619c66be3dab809698e84de8da101ad8cf477504cef2fd3b72f24e6d6f54a, 0xbf332140d0951dd5d01795813499280c0dadfbf3552571b5a7d2a6cedbd1a5db, 0xa50935a301403d2aaf851c5d188716b1282efa2f1c79bb2e183fe3374ab7307c, 0x1a1dc2de4d10c1ad4ba704d66ebd998cd48992b5be9ccc95baa3a2b4fa5fe678, 0xd8e837589d740eb6c64e04aebc8035304d469301701f6bcf5d764878e37b9c1d, 0x6dfa35f17466743faa9ea6aa92745f79184cffaf4e364a659e1664223b2c4987, 0x1f6236f7529d303fd22a9e52df56862ef2772d7d4f107001c3636f0bc9b7fb91, 0xd304b678eea0f3867f211ced3303fcf5d6b895f404b7be53a182f81f4f612829, 0xdb89afdb0c1a10ac42a89c431a1441f2005b848b3e5881e43f034a4eda7a5257, 0x00854bddf02b950ca668c6ba7c305824c29176e62230b1a9754d431e33907a39, 0xe486b9bbfeaae405804d9f9a2cd8d70c0ceae4051b693ca1f7d202a8ad12458c, 0x5bdba83ee27fe3764fba8038d8e166ddf7b4346ce281b876431fd6a07ee947a1, 0x4e1648a532269400ece24d8ac289647310c54c2739baecbd1803406453ee67e1, 0xe21d2d355af70cc7654e00c37756fea5d5072267c19f153e0f850b388f63176d, 0x750fcb1d22697c636cdbaaf3dbb03d48347e663cea2457d2fdc9296442613c3b, 0x3b441965e617373e6f1f36885a22e8ca4a94d54c668ef996e86d7d3d98454355, 0x09cfa35441f7d2bcee08c5a6ab3f90011c54a8441272e5d65625c0c197cf0085, 0x7c81e63d1519cdd7b2211b70d78392c05a9a70dae10d5a48a91bf5cdeeaea96e, 0xb016d66512399c08933216673b2f33e3de0ffd739533ed05f698f455aea10b09, 0xd7b20a6ec1a16b46960e7ebefdf98d6a60ea09ade1cf5b1235d4f8a0bda300ce, 0x68e16fd9663d2d4c11c96f32b8d40d09016ed653b6838ae6347405156f9a8733, 0x1c3805a1592c5a6986636aae1e3b22f8554f83d75bee33a05a62c8eeeb55c408, 0x1cd4b9b7a96fa0c827beb5eeb0b67af3af0c39991be0492b21791c73b1977f89, 0x15d198b06bbfc017a1e743b475c8a978884b71e0149749bacd673b09155485d7, 0xd5305d13eb18c68a9b221986f418d55a7759dd339aa3c87d631fbb25d78299e4, 0x402028f66ac65d87404e74a17031c6cd16af78406704ca2870112331f1f318b8, 0x5fbf8bec2d8f8ecf27efe0bb9d95ef6c13af4fd5aa2599a53afd17bf5c7bc8c8, 0xe75a096d86fd1b3419472c55aa053ef231316f69e4d74d7462541e951f9f77dc, 0x60adc414f638dc63177f57dac2478c02e6413f2bc1dfc80be40f53f057372c5a, 0x25f4d56ecd836db24e4b7809673313ac1c35c22509b9032e26d16546a44374a1, 0x77ffd9b13e832abb38b12a5a4d91c715a243c3ee49ddeab603c50456d491f280, 0xe844dcae98df79029eb64c7581553b584944c6eecadb0d1fac2e5f4cb1c25971, 0xb89a4edbbe17c318cfb09e681e364d7e28a0c7309ace51310b30d148b4b1aa67, 0xa76d8127f1d803e5cbdc6b2c92c8f564b2b9ed54cd8db2595d8f647cf6a78ea3, 0x5c6a6aede955e3473c54c5a1c28d649c0d7452099cb3413cf2d0d220f73c6a2c, 0xffea7a23b3c43fb2b489a9e87f2d3c2457ad1d7b22e6ebdded015c3b6ff4fe9e, 0x85d900c9a2c386dbc23810a28168b2ddfa5c55138f2c59f042fe10f762cfe6ed, 0xa13394b891d86bfda335ac284208cf9d95e700870563a939cbeb197106141104, 0xe36d327166b3e877ea7e936d4b5fb28a2fd8d2619c781203b31394e50162a312, 0xaaf171611461dcdbccf46a1b24a0f4854c56742927fc14cd2ea74adf159ea69d, 0x5ce013fc908759386f605974f6bfcdd31d4023052ec11c465d2ce1803333c11d, 0x5b080b6ef641d2d669fe515de59637a95a752c38a49be4019e5fb3acc2a02188, 0x31133ff76cb75f2d2685ecea1e5a9a247261101986d9bd34550139788337fed5, 0x663e46591e5b011aab0a034c30fb52cd28828cf53b8b98d825c485743df4c5f0, 0x7146a32c598f451e531ba4dcaea977af78a5a49643c2da6f20317598b51f34e7, 0x9155e93ab1f9a8d28a5ab747afc89e7185d5e62deb7eb31dc44f24e90d18c4ae, 0x4167e1aa38583afb1294d20c42988d5971a950a4472eb7c1dcb64ddf79c4b0db, 0xed014895537f9f21c51aea4727c17b9199797c29bb8222e9e449c265ed75832e, 0xd40f4c5205c94e31dddfbba205a0a7d4d80956f6772b294b02ff21dca76e0679, 0x96e37c2045ad9a17bd343ceccd807d81a5f65effd9c00ed15fb282228d42d52f, 0x95d05790173515011123a806c786ce1dda2cc307ed1b499f40ccb7ac3762e3d6, 0x43708dc664bf0ddc241ab61ccf028363b537be6fcec9224c9de4f7339092c875, 0x15e73586d7868a0bd296be832040dc761d83eea8ec4bf9b90fe8d9770c3f6531, 0xc174e48efe4f2d739a5be2d0092789c9d4afe36d64d84bc17abd811c36c91064, 0xcff0cb3ebfb4ca51abacf748c6af951e1bf8c7ac3ee426292bc9467a37b479df, 0xf05e9111f026a97ca88c96d82c5ba7e167305b9c2e42ef8dbe9a55974e3a35c3, 0x6c3e8853383f246d1479e3b0111bcff0235d0dbfffe384b177cb55c7097d256c, 0xbed8c70c362da01efdcea7833d17f18f5bb520c6f30aea543e997f467289dd12, 0xfdb746d5604b2521d3521459e4dae6ace8cc5c5dfd3dea95e797f2117ef2a0a4, 0x131d1da9cd6474942c401bf5bdc1f7e1926b064ad5d671424078610edb2c0f59, 0x1164dce33cf33761294089993fcee9f2077d56d2894f75ecb74e3de577a88279, 0x09a3ef620a4215da7deed28d62fe2333a63833fcf63b4c01b46b5f03990345f7, 0x3a5c17f68b0c4069f1f3fa2363e28b774a8b4881cf9468274f063628babcc884, 0xdd914af3c646a53c7932237513165ee2e0a4d6b2ff0b5ccb3ac4f77ddb2cf93e, 0x8e1e0ae61cc95fedb0b6c2e97d12c92760a896f6f7f9147f6d96a45c301a61fb, 0x5b48aa0aff118a47b640c7f7620f422a4d011a9d1acbbbd960ad50e6ad214738, 0x7c774a7466bda226acc4c602b8c37c261569ac53c6d048bac52e35034622601c, 0xc8b4fc8395a0c7e7df16412dbff0f1c564e354ba97ca18dc40267bfd5afa62a8, 0xb098b4a327601d9778a709ec2f4a0a4ded8fef89c0ad3993bad7165c6d064583, 0xabf2c2df39eefcdb744bd004efb1fa9e94b154e67071b6f331928290450a3420, 0xa2a4109dfe9d8dc3416b60e432a68240d98749ceab5e5ab52840e943070372aa, 0x4b8515f540f871293a0da856ce88708d8ab62d9eff15da7a8a54b50393cc3ac3, 0x4872be7ea3e00c2471b39b89b5a629d7af21ea42aacf286e4f30e5c5b46e6914, 0xa1082cb18ed1c8e087577aadb537f9e4ebf762fc748c0ad08799172720bc8717, 0x8956971daab3cf9d936863f59bf5bcacaddd4f8588cba9344722bf0c20fb7695, 0xc7e1719c24402b5d2ffd1929df0cf0ff5e1e2a1697ed4a37d8be5db202e2485a, 0x19c9350a054dcbb88fe8738591f4504ac793fcfb0133b22d9c84a91773cd3f49, 0x625babcccedb59b9fc7c0742d4668a4357593320dbf93c791ff8e3ac6bb1a3de, 0x0c90873baf7975256e49f3112f27186dbf3691736200fc368daab3f5bb2ac709, 0x137616ca5b1fc2e5308083d64b19e95efe404127ff07fd392e906d4e412c141a, 0xf8e60dd6f80b3c48f76f53e7482ee69c1c7132f6bff125deba1fd28ca6c8fb76, 0x5c95c4d427d99486bae531a2659a75cc84185e8d30d1ce593a5aa55dc66935f5, 0x410867343202a76c0313c6e0f1cdbf831cc2b77af77f8fda26ec917483769efe, 0xb10480862f567485fb5fd7e5ef33b1cc2977d7f0d90302930297578336df8940, 0x3c1a03f9ed9771a9e14ec683cb7aa683065621d62ef33b62f9772c645bee0ad6, 0x8d4d768779446a6c49de849b61bd40237abf080c6834cbdc2635ccc0f05dd6d6, 0x69b82a64dda8732646b72b451323123c9af4eee39a996cfa72cfb8af1aa2c7a3, 0x8cf25c7d806456e82fe2ebc816ee4818cc3e1405857f3f480d1661f0c2d76a51, 0xdfd598d458f90107582091316156db07bd525a4980cfaf32a49f72ecd0359305, 0x68fb2a0b022831c5940b22fbfe31f77e940b3ce3e8d4b36feee7fb87e18593d4, 0xd833eeb9bcdadbfb6d3484681c97a531087f00d827c7face401a4b0f72e18bb9, 0x2fefd045e3d83d392256a5eb78b24501ed199fe0e2f36f0bf4f933da53ac3aa5, 0xc838b833310a5f53cb50b4edc9c1105fd95cd59bbfc9162ec59f59ef89d9a0cd, 0x34abafca91f5c32b3fed2cf19efc90f54265de7c4561a32cd3aaaae5e3ef3aef, 0x74cf886674e59f30a9ed62c2fb955f441e17e5c320d8ab877d20723c6ddd2c83, 0xe904cbb38089a91928d5c67207ec1d5eef2120a25450a7697f7dc144f69b1aba, 0x02c0b59b46da014a63d2327fc118c23cb5f2f2314c5355cb042341c5422a8e46, 0xd90685f7e17f49cf00c903d0af42775355fba320260ed000756311faf5e57686, 0xad0745a0d8f3e57cddd0ff3cd80a85862a71adee7f0c02e3775bc2676eb09926, 0x318055a0d35e21b1a0cef131b3389126bf794aec93ce45aa9cd275f82ca4bf99, 0xdb417ec856807c1d0b046e561b9469dd1cb2fa829a2df4ad2b44f244f5b4e304, 0x40e187eae307e95920b10d06f62cd1438dcd119ee1ced90633f119ec9e71cd65, 0xdfe1574cf4286282ef3090be6b6ab7520193bdcdf9f530b6fe972be10964978d, 0xbf6f6b77735bd56a47d468fff63664a6df55b68ea6febd2346dbaebefa1bd346, 0x9d16408cc2d98f710c1864e2b5d3c552d7cfdf0e3c1747160351092b4aa6fbd9, 0xd74f918fc549024a362e5770653cf4f383cb7ce951f0fb494938e5daeb404fee, 0xc6b8e328025517420a52a90fd7364e75d8e6b3d2982e3bd19c6163a5bf19d9e0, 0x0186fd9dc402914d031e01b272e552305bd4a9311b444ec4ede2a556269c21c2, 0xa5f6fe9d2e949de3348158b2716320247c0397ec78717a983de271806cf642d2, 0xdc8c1ed57e2287af318af110f305c1effbc534754c405252ac6cc30c84aaa291, 0x6437ac9c5325d524a65d0acfddb4588b78d936950a7e687d1c33c1fd9895fa8d, 0x0854cd3923ef76ef8d63ad44903f04d51f2c58a4800be6fec53a0a19be98e13e]","throwable":""}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think I agree it's a bit much for DEBUG, I'll prob move to trace.
Another option could be to print first and last header of batch

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've moved to trace for now. This also made me realise I had left the logging of the bodies in there which is pretty useless without the headers (which is why I created this debug log afterwards)

Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
@matkt
Copy link
Contributor

matkt commented Jun 5, 2023

not sure if it's a really important scenario but I don't see

Scenario : CL and Besu short downtime: 30 - 120 mins
Scenario : CL and Besu long downtime: 7 hours

Copy link
Contributor

@fab-10 fab-10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just some minor log opts

maybeHead = Optional.empty();
} else {
maybeHead = Optional.of(headHash);
public synchronized void maybeUpdateTargetHeight(final Hash headHash) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I am not wrong this is redundant, and the method can be removed, since the update of the target height is done in the syncBackwardsUntil method, that is called just after

if (blockchain.contains(head)) {
LOG.debug("Changing head to {}", head);
blockchain.rewindToBlock(head);
if (blockchain.getChainHead().getHash().equals(lastSavedBlock.getHash())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the code possiblyMoveHead could be removed, but I am still if thinking if this method was introduced to manage cases when there was a reorg, and still do not have a full answer, anyway the amount of tests that you have done make me confident

siladu and others added 6 commits June 6, 2023 08:32
…anager/task/GetBodiesFromPeerTask.java


lazy log

Co-authored-by: Fabio Di Fabio <fabio.difabio@consensys.net>
Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
…ync/backwardsync/BackwardSyncContext.java

Co-authored-by: Fabio Di Fabio <fabio.difabio@consensys.net>
Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
…ync/backwardsync/BackwardSyncContext.java

Co-authored-by: Fabio Di Fabio <fabio.difabio@consensys.net>
Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
@siladu
Copy link
Contributor Author

siladu commented Jun 13, 2023

not sure if it's a really important scenario but I don't see

Scenario : CL and Besu short downtime: 30 - 120 mins
Scenario : CL and Besu long downtime: 7 hours

Running these tests now with nimbus. I believe behaviour will be the same as if it's CL-only downtime since besu just sits there waiting to be driven by CL (and ocassionally serving peers)

@siladu
Copy link
Contributor Author

siladu commented Jun 14, 2023

not sure if it's a really important scenario but I don't see

Scenario : CL and Besu short downtime: 30 - 120 mins
Scenario : CL and Besu long downtime: 7 hours

Running these tests now with nimbus. I believe behaviour will be the same as if it's CL-only downtime since besu just sits there waiting to be driven by CL (and ocassionally serving peers)

@matkt Nimbus-besu worked for these tests.

For the short downtime, I did actually get a World State Root does not match expected value error which halted syncing but was solved with a restart of both clients.
I ran the test from the beginning a second time and didn't have this issue again.
I think this is a separate issue and was noticed during the original nimbus tests as well: #5411
At the moment, I suspect it could be related to processing out of order newPayload in quick succession.

Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
@siladu siladu merged commit da16402 into hyperledger:main Jun 14, 2023
@siladu siladu deleted the remove-maybeHead branch June 14, 2023 22:28
davidkngo pushed a commit to liquichain/besu that referenced this pull request Jun 28, 2023
Its usage is not clear and it's causing besu to rewind the head to it (a large reorg) when combined with nimbus backwards sync.

Add more backwards sync logging.

Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
Co-authored-by: Fabio Di Fabio <fabio.difabio@consensys.net>
Co-authored-by: Sally MacFarlane <macfarla.github@gmail.com>
davidkngo pushed a commit to liquichain/besu that referenced this pull request Jul 21, 2023
Its usage is not clear and it's causing besu to rewind the head to it (a large reorg) when combined with nimbus backwards sync.

Add more backwards sync logging.

Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
Co-authored-by: Fabio Di Fabio <fabio.difabio@consensys.net>
Co-authored-by: Sally MacFarlane <macfarla.github@gmail.com>
davidkngo added a commit to liquichain/besu that referenced this pull request Jul 21, 2023
elenduuche pushed a commit to elenduuche/besu that referenced this pull request Aug 16, 2023
Its usage is not clear and it's causing besu to rewind the head to it (a large reorg) when combined with nimbus backwards sync.

Add more backwards sync logging.

Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
Co-authored-by: Fabio Di Fabio <fabio.difabio@consensys.net>
Co-authored-by: Sally MacFarlane <macfarla.github@gmail.com>
eum602 pushed a commit to lacchain/besu that referenced this pull request Nov 3, 2023
Its usage is not clear and it's causing besu to rewind the head to it (a large reorg) when combined with nimbus backwards sync.

Add more backwards sync logging.

Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
Co-authored-by: Fabio Di Fabio <fabio.difabio@consensys.net>
Co-authored-by: Sally MacFarlane <macfarla.github@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
TeamGroot GH issues worked on by Groot Team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants